Overview

Dataset statistics

Number of variables: 9
Number of observations: 17000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.2 MiB
Average record size in memory: 72.0 B
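
The two memory figures agree with each other under a simple assumption. The sketch below is a rough consistency check, assuming each of the 9 numeric columns is stored as an 8-byte float (the report does not state the dtype) and ignoring any index overhead.

```python
# Rough consistency check of the memory figures reported above.
# Assumption (not stated in the report): every column holds 8-byte floats.
N_ROWS = 17000
N_COLS = 9
BYTES_PER_VALUE = 8  # assumed float64

record_bytes = N_COLS * BYTES_PER_VALUE   # bytes per observation
total_bytes = N_ROWS * record_bytes       # data only, index overhead ignored
total_mib = total_bytes / 2**20           # 1 MiB = 1,048,576 bytes

print(f"record size: {record_bytes} B")
print(f"total size:  {total_mib:.1f} MiB")
```

Under that assumption the per-record size comes out at 72 bytes and the total rounds to 1.2 MiB, matching the figures above.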

Variable types

Numeric: 9

Warnings

longitude is highly correlated with latitude (High correlation)
latitude is highly correlated with longitude (High correlation)
total_rooms is highly correlated with total_bedrooms and 2 other fields (High correlation)
total_bedrooms is highly correlated with total_rooms and 2 other fields (High correlation)
population is highly correlated with total_rooms and 2 other fields (High correlation)
households is highly correlated with total_rooms and 2 other fields (High correlation)
median_income is highly correlated with median_house_value (High correlation)
median_house_value is highly correlated with median_income (High correlation)
total_rooms is highly correlated with households and 2 other fields (High correlation)
longitude is highly correlated with median_house_value and 1 other field (High correlation)
median_house_value is highly correlated with longitude and 2 other fields (High correlation)
latitude is highly correlated with longitude and 1 other field (High correlation)
total_bedrooms is highly correlated with households and 2 other fields (High correlation)
population is highly correlated with households and 2 other fields (High correlation)

Reproduction

Analysis started: 2021-08-15 01:58:00.109835
Analysis finished: 2021-08-15 01:58:25.870587
Duration: 25.76 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json
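
A report of this kind can be regenerated along the following lines. This is a minimal sketch, not the exact invocation used here: the function name `build_report`, the CSV path and the output file name are hypothetical, and the default pandas-profiling settings are assumed rather than the contents of config.json.

```python
def build_report(csv_path, output_path="report.html"):
    """Profile the CSV at csv_path and write an HTML report to output_path.

    csv_path and output_path are placeholders; the source file behind this
    report is not named in the report itself.
    """
    # Imported lazily so the function can be defined without the dependencies.
    import pandas as pd
    from pandas_profiling import ProfileReport

    df = pd.read_csv(csv_path)
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(output_path)
    return output_path
```

Calling `build_report("housing.csv")` with a suitable file (the name is illustrative) would write `report.html` to the working directory.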

Variables

longitude
Real number (ℝ)

HIGH CORRELATION

Distinct: 827
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -119.5621082
Minimum: -124.35
Maximum: -114.31
Zeros: 0
Zeros (%): 0.0%
Negative: 17000
Negative (%): 100.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: -124.35
5-th percentile: -122.47
Q1: -121.79
median: -118.49
Q3: -118
95-th percentile: -117.07
Maximum: -114.31
Range: 10.04
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.005166408
Coefficient of variation (CV): -0.0167709188
Kurtosis: -1.322329668
Mean: -119.5621082
Median Absolute Deviation (MAD): 1.28
Skewness: -0.3040029768
Sum: -2032555.84
Variance: 4.020692325
Monotonicity: Decreasing

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
-118.31               136     0.8%
-118.3                128     0.8%
-118.32               124     0.7%
-118.29               118     0.7%
-118.35               116     0.7%
-118.36               115     0.7%
-118.27               114     0.7%
-118.28               113     0.7%
-118.37               111     0.7%
-118.19               110     0.6%
Other values (817)    15815   93.0%

Smallest values

Value                 Count   Frequency (%)
-124.35               1       < 0.1%
-124.3                2       < 0.1%
-124.27               1       < 0.1%
-124.26               1       < 0.1%
-124.25               1       < 0.1%
-124.23               3       < 0.1%
-124.22               1       < 0.1%
-124.21               3       < 0.1%
-124.19               4       < 0.1%
-124.18               5       < 0.1%

Largest values

Value                 Count   Frequency (%)
-114.31               1       < 0.1%
-114.47               1       < 0.1%
-114.56               1       < 0.1%
-114.57               2       < 0.1%
-114.58               2       < 0.1%
-114.59               2       < 0.1%
-114.6                3       < 0.1%
-114.61               2       < 0.1%
-114.63               1       < 0.1%
-114.65               3       < 0.1%
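
Several of the longitude figures are derived from others in the same section, which gives a quick way to sanity-check them. The sketch below recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation; the variable names are illustrative only.

```python
import math

# Values copied from the longitude statistics in this section.
minimum, maximum = -124.35, -114.31
q1, q3 = -121.79, -118.0
mean, std = -119.5621082, 2.005166408

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance

print(round(value_range, 2), round(iqr, 2), round(cv, 10), round(variance, 6))
```

The CV is negative only because the mean longitude is negative; the standard deviation itself is always non-negative.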

latitude
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 840
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.62522471
Minimum: 32.54
Maximum: 41.95
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 32.54
5-th percentile: 32.82
Q1: 33.93
median: 34.25
Q3: 37.72
95-th percentile: 38.96
Maximum: 41.95
Range: 9.41
Interquartile range (IQR): 3.79

Descriptive statistics

Standard deviation: 2.137339795
Coefficient of variation (CV): 0.05999512459
Kurtosis: -1.112226493
Mean: 35.62522471
Median Absolute Deviation (MAD): 1.2
Skewness: 0.4718011204
Sum: 605628.82
Variance: 4.568221398
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
34.06                 205     1.2%
34.08                 200     1.2%
34.05                 196     1.2%
34.07                 194     1.1%
34.04                 188     1.1%
34.09                 178     1.0%
34.1                  171     1.0%
34.02                 169     1.0%
34.03                 162     1.0%
33.94                 146     0.9%
Other values (830)    15191   89.4%

Smallest values

Value                 Count   Frequency (%)
32.54                 1       < 0.1%
32.55                 3       < 0.1%
32.56                 9       0.1%
32.57                 13      0.1%
32.58                 20      0.1%
32.59                 9       0.1%
32.6                  7       < 0.1%
32.61                 10      0.1%
32.62                 10      0.1%
32.63                 18      0.1%

Largest values

Value                 Count   Frequency (%)
41.95                 2       < 0.1%
41.88                 1       < 0.1%
41.86                 3       < 0.1%
41.84                 1       < 0.1%
41.82                 1       < 0.1%
41.81                 2       < 0.1%
41.8                  2       < 0.1%
41.79                 1       < 0.1%
41.78                 3       < 0.1%
41.77                 1       < 0.1%

housing_median_age
Real number (ℝ≥0)

Distinct: 52
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.58935294
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 8
Q1: 18
median: 29
Q3: 37
95-th percentile: 52
Maximum: 52
Range: 51
Interquartile range (IQR): 19

Descriptive statistics

Standard deviation: 12.58693698
Coefficient of variation (CV): 0.4402665918
Kurtosis: -0.8008262247
Mean: 28.58935294
Median Absolute Deviation (MAD): 10
Skewness: 0.06489403293
Sum: 486019
Variance: 158.4309826
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
52                    1052    6.2%
36                    715     4.2%
35                    692     4.1%
16                    635     3.7%
17                    576     3.4%
34                    567     3.3%
33                    513     3.0%
26                    503     3.0%
18                    478     2.8%
25                    461     2.7%
Other values (42)     10808   63.6%

Smallest values

Value                 Count   Frequency (%)
1                     2       < 0.1%
2                     49      0.3%
3                     46      0.3%
4                     161     0.9%
5                     199     1.2%
6                     129     0.8%
7                     151     0.9%
8                     178     1.0%
9                     172     1.0%
10                    226     1.3%

Largest values

Value                 Count   Frequency (%)
52                    1052    6.2%
51                    32      0.2%
50                    112     0.7%
49                    111     0.7%
48                    135     0.8%
47                    175     1.0%
46                    196     1.2%
45                    235     1.4%
44                    296     1.7%
43                    286     1.7%

total_rooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 5533
Distinct (%): 32.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2643.664412
Minimum: 2
Maximum: 37937
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 2
5-th percentile: 626.95
Q1: 1462
median: 2127
Q3: 3151.25
95-th percentile: 6269.05
Maximum: 37937
Range: 37935
Interquartile range (IQR): 1689.25

Descriptive statistics

Standard deviation: 2179.947071
Coefficient of variation (CV): 0.8245929634
Kurtosis: 29.51588478
Mean: 2643.664412
Median Absolute Deviation (MAD): 792
Skewness: 4.002729999
Sum: 44942295
Variance: 4752169.234
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
1582                  16      0.1%
1527                  15      0.1%
1471                  14      0.1%
1717                  14      0.1%
1703                  14      0.1%
1724                  13      0.1%
1613                  13      0.1%
2053                  13      0.1%
2017                  12      0.1%
2355                  12      0.1%
Other values (5523)   16864   99.2%

Smallest values

Value                 Count   Frequency (%)
2                     1       < 0.1%
8                     1       < 0.1%
11                    1       < 0.1%
12                    1       < 0.1%
15                    2       < 0.1%
18                    3       < 0.1%
20                    2       < 0.1%
22                    1       < 0.1%
24                    2       < 0.1%
25                    1       < 0.1%

Largest values

Value                 Count   Frequency (%)
37937                 1       < 0.1%
32627                 1       < 0.1%
32054                 1       < 0.1%
30405                 1       < 0.1%
30401                 1       < 0.1%
28258                 1       < 0.1%
27700                 1       < 0.1%
26322                 1       < 0.1%
25957                 1       < 0.1%
25187                 1       < 0.1%

total_bedrooms
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1848
Distinct (%): 10.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 539.4108235
Minimum: 1
Maximum: 6445
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 138
Q1: 297
median: 434
Q3: 648.25
95-th percentile: 1283
Maximum: 6445
Range: 6444
Interquartile range (IQR): 351.25

Descriptive statistics

Standard deviation: 421.4994516
Coefficient of variation (CV): 0.7814071079
Kurtosis: 19.69275009
Mean: 539.4108235
Median Absolute Deviation (MAD): 162
Skewness: 3.322636716
Sum: 9169984
Variance: 177661.7877
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
280                   48      0.3%
309                   44      0.3%
343                   43      0.3%
394                   43      0.3%
331                   43      0.3%
345                   43      0.3%
322                   41      0.2%
290                   41      0.2%
340                   41      0.2%
272                   41      0.2%
Other values (1838)   16572   97.5%

Smallest values

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     1       < 0.1%
3                     4       < 0.1%
4                     6       < 0.1%
5                     4       < 0.1%
6                     5       < 0.1%
7                     4       < 0.1%
8                     2       < 0.1%
9                     7       < 0.1%
10                    8       < 0.1%

Largest values

Value                 Count   Frequency (%)
6445                  1       < 0.1%
5471                  1       < 0.1%
5290                  1       < 0.1%
4957                  1       < 0.1%
4952                  1       < 0.1%
4819                  1       < 0.1%
4798                  1       < 0.1%
4492                  1       < 0.1%
4457                  1       < 0.1%
4407                  1       < 0.1%

population
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3683
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1429.573941
Minimum: 3
Maximum: 35682
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 3
5-th percentile: 350.95
Q1: 790
median: 1167
Q3: 1721
95-th percentile: 3297.05
Maximum: 35682
Range: 35679
Interquartile range (IQR): 931

Descriptive statistics

Standard deviation: 1147.852959
Coefficient of variation (CV): 0.8029336057
Kurtosis: 80.86199702
Mean: 1429.573941
Median Absolute Deviation (MAD): 437.5
Skewness: 5.187211878
Sum: 24302757
Variance: 1317566.416
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
891                   23      0.1%
1052                  22      0.1%
850                   19      0.1%
1227                  19      0.1%
926                   19      0.1%
761                   19      0.1%
810                   19      0.1%
658                   18      0.1%
804                   18      0.1%
781                   18      0.1%
Other values (3673)   16806   98.9%

Smallest values

Value                 Count   Frequency (%)
3                     1       < 0.1%
6                     1       < 0.1%
8                     2       < 0.1%
9                     2       < 0.1%
11                    1       < 0.1%
13                    4       < 0.1%
14                    1       < 0.1%
15                    2       < 0.1%
17                    2       < 0.1%
18                    2       < 0.1%

Largest values

Value                 Count   Frequency (%)
35682                 1       < 0.1%
28566                 1       < 0.1%
16122                 1       < 0.1%
15507                 1       < 0.1%
15037                 1       < 0.1%
13251                 1       < 0.1%
12873                 1       < 0.1%
12427                 1       < 0.1%
12203                 1       < 0.1%
12153                 1       < 0.1%

households
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1740
Distinct (%): 10.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 501.2219412
Minimum: 1
Maximum: 6082
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 126
Q1: 282
median: 409
Q3: 605.25
95-th percentile: 1172.1
Maximum: 6082
Range: 6081
Interquartile range (IQR): 323.25

Descriptive statistics

Standard deviation: 384.5208409
Coefficient of variation (CV): 0.7671668163
Kurtosis: 20.69264455
Mean: 501.2219412
Median Absolute Deviation (MAD): 150
Skewness: 3.342668363
Sum: 8520773
Variance: 147856.2771
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
386                   48      0.3%
306                   48      0.3%
282                   47      0.3%
330                   46      0.3%
426                   45      0.3%
380                   44      0.3%
335                   44      0.3%
329                   43      0.3%
284                   43      0.3%
316                   43      0.3%
Other values (1730)   16549   97.3%

Smallest values

Value                 Count   Frequency (%)
1                     1       < 0.1%
2                     2       < 0.1%
3                     2       < 0.1%
4                     4       < 0.1%
5                     7       < 0.1%
6                     4       < 0.1%
7                     6       < 0.1%
8                     6       < 0.1%
9                     4       < 0.1%
10                    6       < 0.1%

Largest values

Value                 Count   Frequency (%)
6082                  1       < 0.1%
5189                  1       < 0.1%
5050                  1       < 0.1%
4769                  1       < 0.1%
4616                  1       < 0.1%
4490                  1       < 0.1%
4372                  1       < 0.1%
4339                  1       < 0.1%
4204                  1       < 0.1%
4072                  1       < 0.1%

median_income
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 11175
Distinct (%): 65.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8835781
Minimum: 0.4999
Maximum: 15.0001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 0.4999
5-th percentile: 1.603395
Q1: 2.566375
median: 3.5446
Q3: 4.767
95-th percentile: 7.36447
Maximum: 15.0001
Range: 14.5002
Interquartile range (IQR): 2.200625

Descriptive statistics

Standard deviation: 1.908156518
Coefficient of variation (CV): 0.4913398081
Kurtosis: 4.76414493
Mean: 3.8835781
Median Absolute Deviation (MAD): 1.07405
Skewness: 1.626693098
Sum: 66020.8277
Variance: 3.641061299
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
3.125                 41      0.2%
4.125                 39      0.2%
2.875                 39      0.2%
15.0001               38      0.2%
2.625                 36      0.2%
3.875                 33      0.2%
3.625                 31      0.2%
3                     31      0.2%
4.375                 30      0.2%
3.375                 28      0.2%
Other values (11165)  16654   98.0%

Smallest values

Value                 Count   Frequency (%)
0.4999                11      0.1%
0.536                 7       < 0.1%
0.6433                1       < 0.1%
0.6775                1       < 0.1%
0.6825                1       < 0.1%
0.6831                1       < 0.1%
0.696                 1       < 0.1%
0.6991                1       < 0.1%
0.7007                1       < 0.1%
0.7025                1       < 0.1%

Largest values

Value                 Count   Frequency (%)
15.0001               38      0.2%
15                    1       < 0.1%
14.9009               1       < 0.1%
14.5833               1       < 0.1%
14.4219               1       < 0.1%
14.4113               1       < 0.1%
14.2959               1       < 0.1%
13.947                1       < 0.1%
13.8556               1       < 0.1%
13.8093               1       < 0.1%

median_house_value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 3694
Distinct (%): 21.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 207300.9124
Minimum: 14999
Maximum: 500001
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 132.9 KiB

Quantile statistics

Minimum: 14999
5-th percentile: 66000
Q1: 119400
median: 180400
Q3: 265000
95-th percentile: 495500
Maximum: 500001
Range: 485002
Interquartile range (IQR): 145600

Descriptive statistics

Standard deviation: 115983.7644
Coefficient of variation (CV): 0.5594947126
Kurtosis: 0.3039975986
Mean: 207300.9124
Median Absolute Deviation (MAD): 68800
Skewness: 0.9730366335
Sum: 3524115510
Variance: 1.34522336 × 10^10
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Most frequent values

Value                 Count   Frequency (%)
500001                814     4.8%
137500                95      0.6%
162500                89      0.5%
112500                85      0.5%
187500                74      0.4%
225000                73      0.4%
350000                64      0.4%
87500                 64      0.4%
150000                52      0.3%
275000                51      0.3%
Other values (3684)   15539   91.4%

Smallest values

Value                 Count   Frequency (%)
14999                 4       < 0.1%
17500                 1       < 0.1%
22500                 3       < 0.1%
25000                 1       < 0.1%
26600                 1       < 0.1%
26900                 1       < 0.1%
27500                 1       < 0.1%
28300                 1       < 0.1%
30000                 2       < 0.1%
32500                 3       < 0.1%

Largest values

Value                 Count   Frequency (%)
500001                814     4.8%
500000                22      0.1%
499100                1       < 0.1%
499000                1       < 0.1%
498800                1       < 0.1%
498700                1       < 0.1%
498600                1       < 0.1%
498400                1       < 0.1%
497400                1       < 0.1%
496400                2       < 0.1%

Interactions

[Figures: 81 pairwise interaction plots, one for each ordered pair of the 9 variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
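
The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation, not the code the report itself runs; `pearson_r` is a hypothetical helper name, and the sample values are the median_income and median_house_value entries from the first five rows of the Sample section.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# median_income and median_house_value from the first five sample rows
income = [1.4936, 1.8200, 1.6509, 3.1917, 1.9250]
house_value = [66900.0, 80100.0, 85700.0, 73400.0, 65500.0]

r = pearson_r(income, house_value)
# Shifting and rescaling one variable (with a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in income], house_value)
print(r, r_rescaled)
```

Five rows are far too few to say anything about the full dataset; the point is only to show the arithmetic and the invariance under location and scale changes.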

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
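
A minimal sketch of that procedure in plain Python is shown below. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their rank positions is an assumption of this sketch (a common convention), not something the report specifies.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sx * sy)

# A monotonic but nonlinear relationship: y = x**3
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(spearman_rho(xs, ys))
```

Because only the ordering matters, the cubic relationship in the example is treated as a perfect monotonic association even though it is far from linear.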

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
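
The pair-counting definition above translates directly into code. The sketch below implements exactly that simple form; `kendall_tau` is a hypothetical helper name. Statistical libraries often use a tie-corrected variant instead, so results on data with many ties may differ from what such libraries report.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1      # both variables move in the same direction
        elif product < 0:
            discordant += 1      # the variables move in opposite directions
        # product == 0 means a tie in x or y; it counts as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

In the example, two of the three pairs are concordant and one is discordant, giving (2 - 1) / 3.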

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
0        -114.31     34.19                15.0       5612.0          1283.0      1015.0       472.0         1.4936             66900.0
1        -114.47     34.40                19.0       7650.0          1901.0      1129.0       463.0         1.8200             80100.0
2        -114.56     33.69                17.0        720.0           174.0       333.0       117.0         1.6509             85700.0
3        -114.57     33.64                14.0       1501.0           337.0       515.0       226.0         3.1917             73400.0
4        -114.57     33.57                20.0       1454.0           326.0       624.0       262.0         1.9250             65500.0
5        -114.58     33.63                29.0       1387.0           236.0       671.0       239.0         3.3438             74000.0
6        -114.58     33.61                25.0       2907.0           680.0      1841.0       633.0         2.6768             82400.0
7        -114.59     34.83                41.0        812.0           168.0       375.0       158.0         1.7083             48500.0
8        -114.59     33.61                34.0       4789.0          1175.0      3134.0      1056.0         2.1782             58400.0
9        -114.60     34.83                46.0       1497.0           309.0       787.0       271.0         2.1908             48100.0

Last rows

       longitude  latitude  housing_median_age  total_rooms  total_bedrooms  population  households  median_income  median_house_value
16990    -124.22     41.73                28.0       3003.0           699.0      1530.0       653.0         1.7038             78300.0
16991    -124.23     41.75                11.0       3159.0           616.0      1343.0       479.0         2.4805             73200.0
16992    -124.23     40.81                52.0       1112.0           209.0       544.0       172.0         3.3462             50800.0
16993    -124.23     40.54                52.0       2694.0           453.0      1152.0       435.0         3.0806            106700.0
16994    -124.25     40.28                32.0       1430.0           419.0       434.0       187.0         1.9417             76100.0
16995    -124.26     40.58                52.0       2217.0           394.0       907.0       369.0         2.3571            111400.0
16996    -124.27     40.69                36.0       2349.0           528.0      1194.0       465.0         2.5179             79000.0
16997    -124.30     41.84                17.0       2677.0           531.0      1244.0       456.0         3.0313            103600.0
16998    -124.30     41.80                19.0       2672.0           552.0      1298.0       478.0         1.9797             85800.0
16999    -124.35     40.54                52.0       1820.0           300.0       806.0       270.0         3.0147             94600.0